Pharmaceutical Step-Therapy Interventions: A Critical Review of the Literature

BACKGROUND: Adoption of step therapy (ST) is quickly outpacing the market's understanding of its clinical, humanistic, and economic outcomes. The broad scope of previous reviews of drug management programs has prohibited an in-depth discussion of the ST literature specifically.
OBJECTIVES: To conduct a critical review of ST program evaluations, discuss their policy implications, and provide recommendations for future research.
METHODS: PubMed was searched for relevant English-language articles, and references of relevant articles were examined. The ST policy under evaluation had to require use of a first-line agent prior to coverage of a second-line agent.
RESULTS: Fourteen evaluations of ST programs have been published, 7 in commercial populations and 7 in Medicaid. Twelve of the studies empirically examined claims data; 1 was a model; and 1 was limited to patient surveys. Five therapy classes, including antidepressants, antihypertensives, antipsychotics, nonsteroidal anti-inflammatory drugs (NSAIDs), and proton pump inhibitors (PPIs), have been evaluated. The research has consistently found statistically significant drug cost savings with the exception of antipsychotics, where rebates have frequently been excluded. Savings result from greater use of first-line medications and from reduced medication initiation, with the magnitude of noninitiation varying across therapy classes. Three studies have examined medication adherence, producing mixed results. Five studies have empirically examined the effect of ST on hospitalization and emergency room utilization and costs, with none finding statistically significantly higher disease-related utilization or spend, outside of higher outpatient expenditures but not higher outpatient utilization in 1 study.
CONCLUSIONS: The research demonstrates that ST programs for therapy classes other than antipsychotics can provide significant drug savings through the greater use of lower-cost alternatives and, to a lesser extent, reduced drug utilization. The drug savings and clinical impact of ST for antipsychotics are unclear given the research conducted to date, but ST programs for NSAIDs and PPIs can provide significant drug savings without increasing use of other medical services. The research on ST shows gaps in the breadth of evaluation and methodological quality as well as possible study bias. Further research on ST is needed for other therapy classes and for the Medicare Part D population. Recommendations for other areas of research, needed methodological improvements, and reducing the potential for study bias are provided.

• The research on ST shows gaps in the breadth of evaluation and methodological quality as well as potential study bias.
• Further research on ST is needed for numerous therapy classes where ST is common and for the Medicare Part D population. Research is also needed to better understand the impact of ST on treatment discontinuities, appropriateness of use, other medical costs, and member/provider satisfaction as well as the effect of alternative designs.
• Needed methodological improvements include (a) use of appropriate comparator groups, (b) examination of disease-related medical spending, (c) adjustment for multiple statistical tests, and (d) better causal linkage between program and outcomes.
• To help reduce the potential for study bias, independently funded evaluations and mandatory study registration prior to initiation should be considered.

CONTEMPORARY SUBJECT
With the growing availability of generic and lower-cost brand alternatives, step therapy (ST) has grown rapidly in popularity in commercial, Medicare, and Medicaid settings as a means to create better value from rising pharmaceutical spend. ST requires a member to try 1 of the first-line medications, often a generic alternative, prior to receiving coverage for a second-line agent, usually a branded product. Nearly 60% of commercial payers reported having 1 or more ST programs in 2010, now making it one of the most popular management tools. 1 In a 2007 report on Medicare drug benefits, 7 of the top 10 largest Part D plans had 1 or more ST programs, with the average being 6 programs. 2 However, adoption of ST is quickly outpacing decision makers' understanding of the clinical, humanistic, and economic value of these programs. Such knowledge is needed to avoid potential unintended consequences, such as medication noncompliance. 3,4 Conversely, concerns over unintended consequences can prevent plan sponsors from adopting programs that evidence later shows to be robust, resulting in wasted dollars with no incremental health benefit. While systematic reviews of drug cost management programs have been conducted previously, the broad scope of these reviews has prohibited an in-depth discussion of the ST literature specifically. 5-7 Accordingly, the purpose of this paper is to conduct a critical review of ST program evaluations and to provide recommendations for future research.

■■ Methods
To identify potential studies for inclusion, PubMed was searched for relevant English-language articles using these subject terms: step therapy, prior authorization (PA), drug policy, and pharmaceutical policy. The reference lists of known studies and review articles were examined to find articles as well. To be considered eligible for review, the policy under evaluation had to be implemented in the United States and require use of a first-line agent prior to coverage of a second-line agent, a hallmark characteristic of ST that distinguishes it from traditional PA. However, ST programs typically allow patients to seek coverage for the second-line agent without having used a first-line medication, utilizing a medical exception process; thus, ST and PA programs overlap in their designs. Three studies that examined policy changes across multiple Medicaid programs but did not separate ST from PA programs in the results were excluded from the review. 8-10

■■ Results
Fourteen evaluations of ST programs have been published to date. 11-24 Seven of the studies were conducted in commercial populations (Table 1), while 7 assessed Medicaid ST programs (Table 2). An analysis by Panzer et al. (2005) was an economic model of an antidepressant ST program for patients with anxiety, 15 and Cox et al. surveyed members with an ST edit/reject to assess their responses to the edit. 17 The remaining 12 studies were retrospective, claim-based analyses, 7 of which examined drug expenditures only, and 5 of which examined drug and medical expenditures. Therapy classes examined include antidepressants, antihypertensives, antipsychotics, nonsteroidal anti-inflammatory drugs (NSAIDs), and proton pump inhibitors (PPIs).
Antipsychotic ST in Medicaid has been the most heavily studied subject. Researchers at Harvard published a series of closely related papers examining the effect of ST programs for antipsychotics in Maine. 18-20 Although the titles of these 3 studies referred to them as PA programs, they actually met the criteria for ST outlined previously. 18 For patients new to therapy with selected second-generation antipsychotics, prescribers had to provide evidence that a patient had not been adequately controlled by preferred agents. Anticonvulsants were placed on PA. Lu et al. (2010) found that drug treatment initiation for bipolar illness decreased by 32% after implementation of the ST and PA programs in Maine, primarily due to a reduction in the use of nonpreferred agents without a corresponding increase in the use of preferred agents. 18 Zhang et al. (2009) found that the same program resulted in an average $27 reduction per patient in pharmacy reimbursement for bipolar disorder during the 8-month policy period. 19 However, the hazard rate of treatment discontinuation was 2.28 times as high in the post-policy period as in the pre-policy period, suggesting that the savings resulted from treatment discontinuation rather than switching to lower-cost alternatives. The third study in the series examined the impact of the same policy on antipsychotic users with a diagnosis of schizophrenia. 20 Patients initiating atypical antipsychotics during the ST program had a 29% greater risk of treatment discontinuity (i.e., gap, switch, or augmentation) than patients initiating before the ST program, whereas no change was observed in the comparison group over the same time period. The authors did not observe antipsychotic drug savings but acknowledged that they could not account for increased rebates from pharmaceutical manufacturers. Law et al. (2008) also examined the effect of an ST program for selected second-generation antipsychotic agents in Texas and West Virginia. 21 Both states grandfathered prior users and required a trial with a preferred agent before coverage of a nonpreferred agent. Both states observed reductions in the market share of nonpreferred antipsychotics, but neither state demonstrated significant savings in pharmacy spend, not factoring in manufacturer rebates.
In contrast, Farley et al. (2008) found about $7 million in drug cost savings for Georgia Medicaid after implementing ST for atypical antipsychotic medications compared with a state Medicaid program without ST. Georgia Medicaid required patients to try 2 typical antipsychotics before receiving coverage for an atypical antipsychotic. The study by Farley et al. is the only analysis of medical expenditures following an antipsychotic ST program. 22 The authors did not find increases in disease-related inpatient or long-term care expenditures but did find an increase in disease-related outpatient medical expenditures ($32 per member per month [PMPM]). To further understand the outpatient cost finding, the authors examined disease-related outpatient utilization and found no significant increase in the ST group compared with the non-ST group. 22
For other therapy classes, research has found that ST produces drug savings. Three studies have examined antidepressant ST. 11,14,15 Dunn et al. (2006) found a 9.0% lower drug cost per day ($0.36 PMPM) for patients in an antidepressant ST […] 15 Also looking at antidepressants, Mark et al. (2010) reported an initial savings of 1.7% across total prescription drug costs (not just antidepressant spend) but found that the savings diminished over time. 11 The authors did not report expenditures for antidepressant medications.
Two evaluations of antihypertensive ST have been published. 12,13 Yokoyama et al. (2007) found a 12.8% lower cost per day for patients in an angiotensin II receptor blocker (ARB) ST program versus a comparison group. 13 An accompanying editorial in JMCP questioned whether Yokoyama et al. underestimated the drug savings from the sentinel effect, since the rate of initiation of either angiotensin-converting enzyme (ACE) inhibitor or ARB therapy was 2.4 times higher in the comparison group. 25 Mark et al. (2009) examined total prescription drug expenditures for antihypertensive users (i.e., not just antihypertensive spend) and reported that overall prescription drug spending initially declined 3.1% after program implementation, but the savings did diminish somewhat over time. 12 The authors did not report changes in antihypertensive drug spending.
Motheral et al. (2004) found drug savings of $0.93 PMPM following implementation of an ST program for PPIs, NSAIDs, and selective serotonin reuptake inhibitors (SSRIs). 16 […] 23 Smalley et al. (1995) found a decrease of 53% in expenditures for NSAIDs after implementation of ST for brand NSAIDs without grandfathering in Tennessee Medicaid. 24 Accordingly, other than studies of antipsychotics within Medicaid populations, the only studies that have not found sustained drug savings from ST are those that examined total prescription drug costs for patients enrolled in ST rather than disease-specific drug costs. 11,12
Research shows that drug cost savings result from greater use of first-line medications and also from less medication use as evidenced in the claims data. Using patient self-report, Motheral et al. found that 17% of patients received no medication after the ST edit, and 16% paid full price for the brand medication. 16 Using similar methodology for PPI and NSAID users, Cox et al. found that 11% of patients received no medication, 8% purchased an over-the-counter (OTC) alternative, and 11% received a sample for the brand drug from their physicians. 17 In both studies, the percentage of patients receiving no medication was highest for PPIs. Yokoyama et al. found that 7% of patients did not have an antihypertensive claim in the 12 months following the ST edit. 13 […] subsequent antisecretory claim. 23 In both studies, the extent to which the patients received samples/OTCs or paid full price for the second-line brand medication is unknown.
Other than studies of the atypical antipsychotics, only 3 studies have empirically assessed the impact of ST on longer-term medication adherence. Mark et al. found a 3 percentage point higher rate of antihypertensive medication discontinuation in an ST group (13%) versus a comparison group (10%). 12 Among antihypertensive users, days supply of antihypertensives was 7.9% lower for the ST group than the comparison group immediately after ST implementation, but this difference disappeared by 5 quarters after implementation. 12 Interestingly, the same pattern occurred for total number of prescriptions per antihypertensive user. For antidepressants, Mark et al. found 3.9% lower days supply immediately following ST implementation, but by 4 quarters after implementation, the antidepressant days supply was higher in the ST group than in the comparison plans without ST for antidepressants. 11 Dunn et al. also found no significant differences in days supply of antidepressants following implementation of ST. 14 In terms of switching or augmentation, Yokoyama et al. found that 24.0% of patients in the intervention group who applied for an ARB but were given ACE inhibitor monotherapy switched to or added an ARB within 12 months, a higher switch percentage than was observed for patients initially started on an ACE inhibitor in the comparison group (7.2%). However, for all the intervention group patients who initiated an ACE inhibitor, the 12-month rate of switching or augmentation with an ARB was 6.1%, similar to that of the comparison group. 13
Other than studies of antipsychotics, 4 studies have empirically examined the effect of ST on inpatient, emergency room (ER), and outpatient measures. Mark et al. reported higher all-cause inpatient, ER, and outpatient expenditures after implementation of an ST program for antidepressants, adjusting for baseline differences in costs and demographics. 11 However, the authors failed to report baseline medical utilization or spending to allow for assessment of the comparability of the 2 groups. Calculation of unadjusted average all-cause medical and pharmacy costs showed an average $1,967 annual cost per patient for ST plans in the year after implementation of ST (2006), 6.9% lower than the average for the comparison plans without ST for antidepressants. The authors also reported nonsignificant increases as meaningful differences for mental health-related utilization. Specifically, mental health-related inpatient admissions had a generalized estimating equation (GEE) coefficient of 0.184 and P value of 0.15, and mental health-related ER visits had a GEE coefficient of 0.185 and P value of 0.13 (Table 4 in the study report), but the abstract reported that "overall and mental health-specific inpatient and ER utilization and costs increased." 11 As to costs, despite the mention of increased mental health costs in the study abstract, no multivariate model of mental health-related medical expenditures was found in the article, only a model for all-cause expenditures (Table 5 and Figure 1 in the study report). Finally, during the time period of the study by Mark et al., 11 many employers and health plans were actively adopting new strategies to improve Health Plan Employer Data and Information Set (HEDIS) scores for depression treatment, including initiation of antidepressant therapy. 26 The authors did not indicate whether the treatment or comparison group had such programs in place during this time period that could have affected the study results.

[Table 1. Published Evaluations of Step-Therapy Programs in Commercial Populations]
[Table 2. Published Evaluations of Step-Therapy Programs in Medicaid Populations]

In a study with similar design for antihypertensive ST, Mark et al. did not find differences in all-cause inpatient or outpatient expenditures but did find higher all-cause inpatient, outpatient, and ER utilization. 12 The authors did not examine either disease-related (cardiovascular) utilization or expenditures in this earlier study.
In Georgia Medicaid, Delate et al. found that enrollees who received a histamine-2 receptor antagonist (H2RA) or no antisecretory drug following an ST edit were no more likely to incur greater gastrointestinal (GI)-related or total medical care expenditures than were enrollees who received a PPI. 23 Nonusers following a PPI PA edit were less likely than patients who received a second-line PPI medication to have any GI diagnosis (48% vs. 81%, respectively, P < 0.001), suggesting at least some channeling of patients to the appropriate alternative. 23 Smalley et al. found no increase in musculoskeletal-related Medicaid expenditures for nondrug services following implementation of an ST program for brand NSAIDs. 24 Neither study had an external control group, but both used within-group comparators to account for underlying market trends.
Panzer et al. used a model to simulate medical costs for antidepressant ST for patients with anxiety, with or without depression. 15 The probability that each patient would experience continuous treatment for 180 days or longer, discontinue treatment early, or have a therapy change (defined as either a switch or augmentation) was determined from previous descriptive literature (which the authors could not access for review; 1 study was published in a magazine not indexed in Medline, and the other was available as an abstract only). Drug use was then assigned to each patient based on a previous study, with patients having less than 180 days of therapy being assigned $5,223 in medical costs, those with 180 days or more being assigned $4,102 in costs, and those with a therapy change being assigned $6,741 in medical costs.
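The cost-assignment step described above amounts to a probability-weighted average of the three per-pattern medical costs. The following is a minimal sketch of that arithmetic only, not the model of Panzer et al.: the 3 dollar amounts are the ones quoted in the text, while the function name, the dictionary keys, and every probability are hypothetical values chosen for illustration.

```python
# Medical cost assigned to each drug use pattern (dollar values quoted in the text).
COST_BY_PATTERN = {
    "under_180_days": 5223.0,    # less than 180 days of therapy
    "180_days_or_more": 4102.0,  # continuous treatment for 180 days or longer
    "therapy_change": 6741.0,    # switch or augmentation
}


def expected_medical_cost(pattern_probs):
    """Return the probability-weighted medical cost per patient."""
    if set(pattern_probs) != set(COST_BY_PATTERN):
        raise ValueError("probabilities must cover exactly the 3 patterns")
    if abs(sum(pattern_probs.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(prob * COST_BY_PATTERN[name] for name, prob in pattern_probs.items())


# HYPOTHETICAL pattern probabilities; these are NOT taken from any study.
no_st = {"under_180_days": 0.40, "180_days_or_more": 0.45, "therapy_change": 0.15}
with_st = {"under_180_days": 0.42, "180_days_or_more": 0.42, "therapy_change": 0.16}

# Simulated per-patient cost difference attributed to the program.
difference = expected_medical_cost(with_st) - expected_medical_cost(no_st)
print(f"Simulated difference per patient: ${difference:.2f}")
```

Because the result is driven entirely by the assumed probabilities and the assigned costs, a calculation of this kind is only as credible as the study that supplied the cost figures.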
The model simulated a net $0.06 PMPM increase in all-cause total medical costs. The key limitation of this model is the choice of study used for the medical cost assumptions, which was a cross-sectional examination of the association between drug use patterns for antidepressants and medical costs. Results from such cross-sectional studies cannot be translated into causal conclusions about interventions due to the presence of the healthy adherer effect, the tendency of people who are adherent to their medications to also engage in other healthy behaviors, such as exercising regularly and eating a healthy diet. 27 The presence of the healthy adherer effect leads to cross-sectional studies showing that higher rates of medication adherence are associated with better outcomes and lower health care costs with effect sizes far beyond what evidence from randomized controlled trials (RCTs) would suggest. Thus, the attempt to make causal inferences about ST from a cross-sectional study of compliance and medical costs is a fatal flaw and prohibits making any conclusions from the model by Panzer et al.
Finally, the number of studies of humanistic outcomes, such as patient satisfaction, is also quite limited. Across PPIs, SSRIs, and NSAIDs, Motheral et al. found that compared with receipt of a generic, paying out of pocket for the brand drug (odds ratio of 0.25; P < 0.05) and receiving no medication (odds ratio of 0.12; P < 0.01) were associated with significantly lower satisfaction with the pharmacy benefit. 16 Cox et al. did not find a statistically significant relationship between outcome of the ST edit and pharmacy benefit satisfaction but did find that patients who received a covered medication other than the brand were less satisfied with medication they received than those who received the brand drug (P < 0.001). 17

■■ Discussion
The research demonstrates that ST programs for therapy classes other than antipsychotics can provide significant drug savings through the greater use of lower-cost alternatives and to a lesser extent, reduced drug utilization. As expected, programs that do not grandfather have shown the largest savings, but elimination of grandfathering will not be appropriate for all therapy classes and will risk increased member and provider dissatisfaction. While substitution with lower-cost alternatives is the primary objective of an ST program, reduced drug utilization is not an intended goal, the exception being those therapy classes where appropriate OTC substitutes are available. Depending on the therapy class, research has found that between 7% and 22% of patients have no prescription claim submitted to their insurance provider following an ST edit. 13,23 Some nonutilization in the prescription claims actually reflects OTC purchases or other responses that may be clinically appropriate. 16,17 However, survey research has shown that a small percentage of nonutilization remains even after accounting for these other behaviors, at least for the therapy classes studied to date, including NSAIDs, PPIs, and antidepressants. In the first 2 classes, nonutilization may have little if any clinical impact on the patient given the less severe indications for which these drugs are sometimes used, and the extent of true nonutilization has not been examined for therapy classes, the exception being antipsychotics, where the potential clinical impact may be more significant.
At least 1 pharmacy benefit management company (PBM) has developed a program to reduce the likelihood of nonuse after ST intervention. This program identifies patients who have not filled a prescription claim within 2 days following their ST edit and then notifies the patient and provider about the ST program, explaining medication alternatives and the steps for obtaining a PA. 28 Based on unpublished RCTs, the addition of a follow-up letter has improved medication initiation rates, increased use of generics, and reduced PAs for brand medications. 29 Notably, the Centers for Medicare and Medicaid Services (CMS) requires a similar notice to be mailed within 3 business days to each Medicare Part D beneficiary who receives a 1-time transition fill for a drug subject to ST, PA, or nonformulary edits. 30 Greater adoption of this type of tool, for patients with and without transition fills, would likely help to improve appropriate utilization associated with ST programs and increase member satisfaction.
For antipsychotics, the inconsistent findings for drug savings across studies likely reflect the different program designs and the lack of accounting for rebates where relevant. Georgia Medicaid found significant savings for a program that required use of 2 typical antipsychotics, which are available in generics and are far less expensive than atypicals. The other studies all had 1 or more atypical antipsychotic on the preferred drug list, reducing the potential savings, particularly when savings from rebates are not included. In addition, unlike other antipsychotic programs evaluated, Georgia Medicaid did not grandfather prior users, which also would have contributed to the greater observed savings. Antipsychotic ST in Medicaid enrollees with bipolar disorder or schizophrenia has been associated with reductions in treatment initiation and continuation, unintended consequences that may have contributed to the withdrawal of these policies in Maine and Georgia. In these studies, risperidone was the preferred atypical antipsychotic, except for Georgia, which had no atypical on the preferred list. 18-22 Each atypical medication's efficacy and side effect profile may or may not make it a good choice for initial therapy in any one patient, 31 perhaps contributing to the reduced initiation observed in Maine. However, the extent to which the observed treatment discontinuities result in increased use of other medical services is unclear, since Farley et al. found no increase in inpatient expenditures, ER expenditures, or outpatient utilization for an ST program that considered typicals as first-line and appeared not to grandfather prior users. 22 Furthermore, in the Vermont Medicaid program, rescission of a PA exemption for new users of antipsychotics, antidepressants, and anxiolytics/sedatives was not followed by decreased utilization of medications on the PA list, and mental health-related hospitalizations declined following removal of the PA exemption. 32 As none of the studies that found increased treatment discontinuities examined other medical utilization or clinical outcomes, it is unclear whether the potentially inconsistent findings are due to differences in program design or a lack of short-term link between decreased medication use and use of other medical services. If the latter, are the observed treatment discontinuities leading to negative effects on important clinical outcomes that do not necessarily manifest in claims-based measures of utilization?
Given the conflicting findings to date, future research should assess whether ST for antipsychotics creates unintended adverse clinical consequences by increasing therapy discontinuities early in treatment, examining variations across specific disease populations and program designs. 33 No direct implications can be drawn from this research for the use of ST to address the broader use of antipsychotics in the commercially insured population given that rates of off-label use are high, 34 and commercially insured patients may be less vulnerable to the administration requirements of ST. As part of its comparative effectiveness research program, the Agency for Healthcare Research and Quality (AHRQ) concluded in 2007 that there was "insufficient high-grade evidence to reach conclusions about the efficacy" of atypical antipsychotics (olanzapine, quetiapine, risperidone, ziprasidone) for off-label uses such as behavioral problems in dementia, depression, obsessive-compulsive disorder, posttraumatic stress disorder, and personality disorders. 35 For PPIs and NSAIDs, the research indicates that ST, even without grandfathering, can reduce drug expenditures without increasing use of other related medical services, making these therapy classes low-hanging fruit for increased management for plan sponsors who have yet to implement ST. Beyond these limited comments, no other definitive conclusions can be made about ST programs related to quality of care, humanistic outcomes, or medical expenditure offsets due to the lack of data on key outcomes and the critical limitations of some of the published work. Recommendations for future research fall into 3 areas: (a) research topics, (b) methodological considerations, and (c) publication bias.

Research Topics
ST research published to date has examined only a small portion of the therapy classes, outcome domains, and populations that warrant evaluation (Table 3). Specifically, there is a need for further research on ST programs in each of the following areas:
Evaluation of ST programs in other populations and therapy classes, such as antihyperlipidemics, antiasthmatics, antidiabetics, and multiple specialty therapy classes. ST programs now exist for dozens of therapy classes with adoption continuing to grow. For example, in 2010, nearly 40% of employers had implemented ST for antihyperlipidemics, yet no evaluations of this therapy class have been published. 1 Research in the Medicare population is also needed as price sensitivity and indications for use can vary.
Improved understanding of the prescription drug savings from step therapy, including an examination of the net savings of ST after program fees and rebates (when applicable), an understanding of how the savings change over time as physicians become familiar with the program (i.e., sentinel effect), and the extent to which savings are driven by greater use of generics versus OTC use or nonutilization.
Evaluations of the effect of step therapy on treatment discontinuities, including medication switching and adherence in clinically relevant therapy categories. To date, treatment discontinuities have been examined in a minority of studies, producing mixed results. Are patients who switch to the first-line agent more likely to prematurely discontinue therapy or to switch to an alternative medication? If so, are these discontinuities driven by real or perceived differences in effectiveness? Alternatively, do lower copayments for generic drugs lead to higher medication adherence in ST?
Examination of the effect of step-therapy programs on the appropriateness of use. The extent to which patients are being channeled to the appropriate alternative has received little attention outside of the work of Delate et al. 23 For statins, are patients at higher risk for cardiovascular events more likely to receive a PA for a higher dose brand medication and less likely to not initiate therapy? The definition of appropriate use will vary by therapy class and could include comparisons of use by indication, severity, and/or based on clinical guidelines.
Evaluations of the effect of step-therapy programs on clinical outcomes and use of other medical services, including hospitalizations and ER visits. The current research on other medical utilization is limited in quantity and plagued by questionable methodology. This research should also include relevant safety-related hospitalizations, since research in Canada found that more restrictive ST for cyclooxygenase (COX)-2 inhibitors was associated with a lower rate of hospital admission due to GI bleed. 36 While ST evaluations have often been conducted by PBMs that may lack medical data, health plans routinely have access to the medical data. In the future, examination of clinical outcomes beyond those obtained by claims […]
[…] placed on it, creating additional administrative burden and having questionable impact on quality of care. 40,41 Whether physicians will view these automated tools and edits for ST as a timely and efficient addition to their practices to support provision of high quality and cost-effective care or, alternatively, distort the system by learning and reporting the clinical criteria that produce an approval is an open question.

Methodological Improvements
The suggestions for methodological improvements are not unique to ST evaluations nor do they represent an exhaustive list of good research practices. Rather, the recommendations highlight important areas for improvement based on the ST evaluations published to date.

Inclusion of appropriate comparator groups.
Given that the effect of an ST program on drug costs or medical expenditure is likely to be small as a percentage of total expenditure (i.e., small effect size), it is critical to use comparison groups that are similar at baseline on the key outcomes, key drivers of utilization and expense (e.g., age), and benefit design (e.g., copayment amounts). In both studies by Mark et al., baseline comparisons on key variables, such as drug spending and disease-related medical expenditures, were not reported. 11,12 Although the authors controlled for baseline measures in the statistical analyses, statistical controls are not always sufficient as noted by Fairman and Curtiss (2008), and baseline differences in known measures may signal differences in unknown factors. 42
Examination of disease-related drug and medical spend. Mark et al.'s (2009, 2010) use of all-cause drug or medical spend as an outcome measure is a concerning methodological approach because examination of all-cause expenditures creates greater risk for spurious findings and confounding due to other variables. This concern is magnified by the lack of reporting of relevant baseline utilization and spending. It is critical to examine disease-related medical spending when assessing medical expenditure offsets to establish a logical causal pathway between ST implementation and medical outcomes. 43 While sensitivity analysis can examine all-cause medical spend because of uncertainties in diagnostic coding, the primary endpoint should be disease-related drug and medical spending.
Adjustment for multiple statistical tests. ST evaluations have historically conducted dozens of statistical comparisons, increasing the risk of a significant finding merely due to chance. This problem is compounded by the large sample sizes, which can make the most trivial of differences statistically significant. Accordingly, it is appropriate to adjust for the multiple statistical tests by modifying the alpha required for statistical significance, 44,45 which generally has not been done in published ST evaluations.
Better causal linkage between program and outcomes. In data will be important as research expands into conditions where claims-based measures may not exist and/or lack the sensitivity to detect clinically meaningful changes in selected conditions, such as rheumatoid arthritis.
Insights into patients' and physicians' understanding and satisfaction with step therapy. Early research found that commercially insured patients perceived ST edits to be the same as PAs and were not aware that lower cost therapeutic alternatives existed, leading to unnecessarily high rates of PA and nonutilization. 29 Such misperceptions may be even greater among Medicaid and Medicare populations, which only serves to reduce program performance and increase dissatisfaction among beneficiaries. Given that concerns over member disruption and dissatisfaction are perhaps the biggest barriers to adoption of ST programs and that research is very limited, this is a valuable area for further research.
Examination of alternative step-therapy designs. ST programs can vary in terms of their extent of coverage and approval process. Two coverage variations that warrant evaluation are (a) programs that do not grandfather prior users of the second-line alternative, a feature that may grow for selected therapy classes where clinically questionable use is rampant, and (b) programs that cover OTC alternatives when available. Such design variations have the potential to significantly affect not only drug savings, as research in Medicaid has already shown, but also treatment discontinuities, patient satisfaction, and administrative burden for patients and providers. As Curtiss previously highlighted, the restrictiveness of the PA process can also vary based on the PA exception criteria and the approval process (e.g., phone versus fax), and more research is needed to understand how these variations affect program outcomes. 25 Even the most common criterion, prior use of a first-line agent, may not always be automated in the claims data, 37 but evaluations of nonautomated interventions have been limited to Medicaid populations.
In addition, at least 1 PBM has reported use of "smart edits" for ST that link to the patient's medical history to check for diagnoses in PA criteria. 38 This process allows for automatic coverage of second-line medications for patients with clinical indications that meet the PA criteria, such as automated coverage of a statin for patients with evidence of a previous myocardial infarction in the claims data, 39 or automated coverage of an antipsychotic for patients with a previous diagnosis of schizophrenia or bipolar disorder. An understanding of the incremental cost, savings, and quality of care from an integrated program is an important area of inquiry. Lastly, as physician adoption of e-prescribing and electronic medical records (EMRs) grows-allowing for instant awareness of and response to an ST edit-evaluations of real-time ST will be needed. To date, research has found that health information technology, such as EMRs, does not always live up to the high expectations assessments and has been suggested for studies of prescription cost-sharing. [48][49][50] While pharmaceutical funding does not necessarily indicate bias, it is important to conduct independently funded assessments of commercial ST programs. Of course, the potential for bias is not limited to pharmaceutical manufacturers. PBMs and health plans could be biased to demonstrate that these programs save money without affecting quality of care. However, no research has formally assessed the presence of bias among these health care stakeholders.
Study registration prior to initiation and mandatory publication. Registration of study protocols prior to study initiation and mandatory publication of full study results after study completion can help mitigate the potential for reporting bias. This idea was discussed for decades with regard to clinical trials, and in 2005, the International Committee of Medical Journal Editors implemented a requirement for clinical trial registration before patient enrollment. While the policy was criticized as burdensome and stifling of competition, the number of registered trials at ClinicalTrials.gov, the largest trial registry at that time, grew from 13,153 before the policy to include 67,000 trials as of January 2009. 51 Currently, observational studies, such as ST evaluations, do not require registration. While this approach is not without its challenges, mandatory study registration is beginning to receive serious consideration for the broader set of observational research. 52

Limitations
First, this study was limited to ST evaluations published in the United States. Second, while every attempt was made to identify ST evaluations even if they were identified as PA programs in the title, relevant research could have been missed. Similarly, ST and PA are not entirely distinct benefit tools, and while this study made every attempt to include only those programs that were primarily designed as ST, programs could have been misclassified. Third, the study did not use formal or quantitative approaches to assess publication bias; rather, conclusions reflect inferences made by the study author.

■■ Conclusions
The popularity of step therapy among commercial, Medicaid, and Medicare plans is no doubt due to the wide availability of generic alternatives that offer significant savings, the strong clinical evidence that typically underlies these programs, and their ability to affect only new users, thereby minimizing member disruption. However, evaluations of ST programs have not kept pace with their growing use in the public and private sectors. An expanded research agenda is needed to better understand the economic, clinical, and humanistic outcomes of ST programs. In parallel, improved methodological approaches and greater use of established strategies for reducing study bias are warranted. the few studies that examined medical spending, the research has been challenged by the lack of linkage between medical expenditures and medication use patterns. One would hypothesize that other medical expenditures would increase the most among noncompliers and, in particular, noncompliers with a diagnosis indicating greater severity or need (e.g., users of antidepressants for major depression versus nonspecific pain), yet prior research has not been conducted at that level of specificity. Given the quasi-experimental and sometimes observational nature of these evaluations and the multitude of potential confounders, such specificity is necessary to establish the causal linkage between the program and outcomes observed.

Study Bias
Study bias can take many forms, including publication bias (i.e., selective publication of research findings, depending on the nature and direction of the results), multiple publication bias, or outcomes reporting bias. 46 Outcomes reporting bias occurs when outcomes are selectively reported, when negative results are reported in a positive manner, and when conclusions are not supported by the results.
Examples of outcome reporting bias are widespread in health care. One study of RCTs found that primary outcomes data had been newly introduced, omitted, or changed in more than 60% of comparisons between publications and study protocols. 47 As ST evaluations do not require registration, inferences must be drawn from examination of the published studies. In the case of the 2 pharmaceutical industry-sponsored studies by Mark et al., it is unclear why the authors chose all-cause medical utilization and expenditures as the exclusive endpoints in one study and the primary endpoints in another. 11,12 The inclusion of total prescription drug spending rather than disease-related drug spending as the primary measure of program savings is equally puzzling, since there is no compelling explanation for why ST would affect drug expenditures that are unrelated to the disease.
Another potential indication of outcome reporting bias is that in their analysis of an ST for antidepressant drugs, Mark et al. reported that mental health-related inpatient admissions and ER visits were higher in the ST group relative to the comparison group following implementation, but they failed to mention that these results were not statistically significant. 11 In fact, the results did not even approach statistical significance despite the large sample size. Against this backdrop, 2 recommendations are provided to reduce the potential for bias in ST evaluations.

Independently funded evaluations.
While studies of Medicaid populations have received significant public funding, commercial program studies examining medical spending have been funded exclusively by pharmaceutical manufacturers. Bias in studies funded by pharmaceutical manufacturers has been reported for clinical trial efficacy and cost-effectiveness